Universal Access To All Knowledge
Home Donate | Store | Blog | FAQ | Jobs | Volunteer Positions | Contact | Bios | Forums | Projects | Terms, Privacy, & Copyright
Search: Advanced Search
Anonymous User (login or join us)
Upload

Reply to this post | See parent post | Go Back
View Post [edit]

Poster: Administrator, Curator, or StaffJeff Kaplan Date: Aug 29, 2012 4:36pm
Forum: web Subject: Re: Can you please add us to your directory

from the FAQ at http://archive.org/about/faqs.php

Much of our archived web data comes from our own crawls or from Alexa Internet's crawls. Neither organization has a "crawl my site now!" submission process. Internet Archive's crawls tend to find sites that are well linked from other sites. The best way to ensure that we find your web site is to make sure it is included in online directories and that similar/related sites link to you.

Alexa Internet uses its own methods to discover sites to crawl. It may be helpful to install the free Alexa toolbar and visit the site you want crawled to make sure they know about it.

Regardless of who is crawling the site, you should ensure that your site's 'robots.txt' rules and in-page META robots directives do not tell crawlers to avoid your site.

When a site is crawled, there is usually at least a 6-month lag, and sometimes as much as a 24-month lag, between the date that web pages are crawled and when they appear in the Wayback Machine.

In some cases, crawled content from certain projects may appear in a much shorter timeframe — as little as a few weeks from when it was crawled. Older material for the same pages and sites may still appear separately, months later.